Matching

Lalonde (1986)

  • National Supported Work experiment: qualified applicants randomly assigned to either a treatment or control condition. Treatment group received a guaranteed job for 9-18 months with additional counseling and support
  • Very positive results!
                (1)
(Intercept)     4554.802 ***
                (408.046)
treat           1794.343 **
                (632.854)
N               445
R2              0.018
logLik          -4542.741
AIC             9091.482
*** p < 0.001; ** p < 0.01; * p < 0.05.

Lalonde (1986)

  • Lalonde then swapped the experimental control group for a sample drawn from the general population, and showed that a similarly constructed study with observational data would fail to recover the experimental result, even after controlling for observed characteristics
              vs experimental control   vs population
(Intercept)   4554.802 ***              6984.170 ***
              (408.046)                 (360.710)
treat         1794.343 **               -635.026
              (632.854)                 (657.137)
N             445                       614
R2            0.018                     0.002
logLik        -4542.741                 -6346.371
AIC           9091.482                  12698.742
*** p < 0.001; ** p < 0.01; * p < 0.05.

Lalonde (1986)

(he tried a bunch of stuff, none of it worked)

Lalonde (1986)

Candidates for job training were unlike the general population. They weren’t even especially similar to members of the general population with comparable income, education, age, or race.

The treatment group differed from the general population in their expected potential income, so a naive difference of means combines the effect of job training with the baseline difference in potential outcomes between applicants and non-applicants.

\[\underbrace{E[Y_i(1) - Y_i(0)|D_i=1]}_\text{job training effect} + \underbrace{E[Y_i(0)|D_i=1] - E[Y_i(0)|D_i=0]}_\text{differences between applicants and non-applicants}\]
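A quick numeric illustration of this decomposition, using a toy simulation (the variable names and numbers here are invented for illustration, not Lalonde's data):

```r
set.seed(42)
n <- 1e5
ability <- rnorm(n)                       # unobserved driver of income
D <- rbinom(n, 1, plogis(ability))        # higher-ability people apply more often
Y_0 <- ability                            # potential income without training
Y_1 <- ability + 2                        # training raises income by 2
Y <- ifelse(D == 1, Y_1, Y_0)             # observed income

naive <- mean(Y[D == 1]) - mean(Y[D == 0])            # naive difference of means
effect <- mean(Y_1[D == 1] - Y_0[D == 1])             # E[Y(1) - Y(0) | D = 1]
selection <- mean(Y_0[D == 1]) - mean(Y_0[D == 0])    # applicant/non-applicant gap
all.equal(naive, effect + selection)                  # the identity holds exactly
```

Here the selection term is positive, so the naive comparison overstates the effect; in Lalonde's setting the applicants were poorer than the general population, so the naive estimate was pushed in the opposite direction.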

Observed Imbalance

Variable       N     population (N = 429)   control (N = 260)   treated (N = 185)
age            874   25 (19, 35)            24 (19, 28)         25 (20, 29)
educ           874   11 (9, 12)             10 (9, 11)          11 (9, 12)
race           874
  black              87 (20%)               215 (83%)           156 (84%)
  hispan             61 (14%)               28 (11%)            11 (5.9%)
  white              281 (66%)              17 (6.5%)           18 (9.7%)
married        874   220 (51%)              40 (15%)            35 (19%)
nodegree       874   256 (60%)              217 (83%)           131 (71%)
income 1974    874   2,547 (0, 9,277)       0 (0, 139)          0 (0, 1,291)
income 1975    874   1,087 (0, 3,881)       0 (0, 650)          0 (0, 1,817)
Cells show median (IQR) or n (%).

Imbalance

Control variables probably can’t fix this.

  • large observed differences often imply large unobserved differences
  • even if all confounders are observed and controlled for, regression adjustment assumes a linear relationship between the confounders and the outcome.

Imbalance

Applied for program \(\rightarrow\) Earnings

Applied for program \(\leftarrow\) Education \(\rightarrow\) Earnings


Imbalance

dataFunction <- function() {
  N <- 1000
  education <- rpois(N, 3)                       # education
  D <- rbinom(N, 1, arm::invlogit(education))    # applied to job program
  Y_1 <- 2 + education^2 + rnorm(N)              # potential outcome with training
  Y_0 <- 0 + education^2 + rnorm(N)              # potential outcome without training
  ATE <- mean(Y_1 - Y_0)                         # the true effect
  Y <- ifelse(D == 1, Y_1, Y_0)                  # the observed outcomes
  df <- tibble(income = Y, Y_0 = Y_0, Y_1 = Y_1,
               treated = factor(D, labels = c("Control", "Treated")),
               education)
  return(df)
}

set.seed(9999)
data<-dataFunction()
ggplot(data, aes(x = income, y = treated)) +
  stat_halfeye() +
  theme_bw() +
  xlab("Income")
mean(data$Y_1 - data$Y_0)
[1] 1.870983

Imbalance

Notably, since the education confounder has a non-linear effect on income, we don’t actually get a correct estimate even if we control for it:

               no controls   ols + control
(Intercept)    2.721         -5.448
               (1.187)       (0.411)
treatedTreated 12.322        -2.020
               (1.259)       (0.455)
education                    7.084
                             (0.080)
Num.Obs.       1000          1000
actual ATE = 1.871

Matching

The experimental ideal allows us to assume this:

\[Y_i(0), Y_i(1), X_i \mathrel{\unicode{x2AEB}} D_i\]

But we could still recover the average treatment effect if we can assume that assignment is independent of the potential outcomes conditional on covariates:

\[Y_i(0), Y_i(1) \mathrel{\unicode{x2AEB}} D_i | X_i\]

Both matching and regression methods attempt to meet this more relaxed condition, but matching can potentially do so more flexibly.

Exact matching

  • Match groups with exactly the same covariate values, discarding strata that lack at least one treated and one control unit
  • Re-weight each stratum so that the treatment and control groups are comparable
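The two steps can be sketched in a few lines of base R (a toy illustration of the idea with simulated data, not the `MatchIt` implementation used below):

```r
set.seed(1)
n <- 5000
x <- rpois(n, 2)                          # a discrete covariate
D <- rbinom(n, 1, plogis(x - 2))          # treatment is more likely at high x
y <- 1 * D + x + rnorm(n)                 # true treatment effect = 1

# Step 1: stratify on x; drop strata without both a treated and a control unit
strata <- split(seq_len(n), x)
strata <- strata[sapply(strata, function(i) length(unique(D[i])) == 2)]

# Step 2: weight each stratum's treated-control difference by its share
kept <- length(unlist(strata))
est <- sum(sapply(strata, function(i) {
  d <- mean(y[i][D[i] == 1]) - mean(y[i][D[i] == 0])
  d * length(i) / kept
}))
round(est, 2)                             # close to the true effect of 1
```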

Exact matching

Exact matching on education, however, gets us much closer to the true average treatment effect:

ate <- mean(data$Y_1 - data$Y_0)

matched <- matchit(treated ~ education, data = data,
                   method = "exact",
                   estimand = "ATE")
mdat <- match.data(matched)

exact_match_fit <- lm(income ~ treated + education,
                      data = mdat, weights = weights)

modelsummary(list("no controls" = ols,
                  "ols + control" = ols_control,
                  "exact matching" = exact_match_fit),
             gof_map = c("nobs"),
             notes = sprintf("actual ATE = %s", round(ate, digits = 3)))
               no controls   ols + control   exact matching
(Intercept)    2.721         -5.448          -5.446
               (1.187)       (0.411)         (0.335)
treatedTreated 12.322        -2.020          1.879
               (1.259)       (0.455)         (0.304)
education                    7.084           5.546
                             (0.080)         (0.071)
Num.Obs.       1000          1000            862
actual ATE = 1.871

Exact matching

In repeated simulations, the exact matching estimates are unbiased and have only slightly more variance than the actual average treatment effect:

set.seed(999)
fun <- function() {
  df <- dataFunction()                     # re-using the data-generating function
  ATE <- mean(df$Y_1 - df$Y_0)
  model <- lm(income ~ education + treated, data = df)
  matched <- matchit(treated ~ education, data = df,
                     method = "exact", estimand = "ATE")
  mdat <- match.data(matched)
  fit1 <- lm(income ~ treated, data = mdat, weights = weights)
  tibble("regression estimate" = coef(model)[3],
         "matched estimate" = coef(fit1)[2],
         "Average Treatment Effect" = ATE)
}

results <- replicate(500, fun(), simplify = FALSE) |>
  bind_rows() |>
  pivot_longer(cols = everything()) |>
  mutate(value = unlist(value))


Exact matching: limitations

In practice, exact matching is not feasible for most data: most real-world datasets will have few, if any, exact matches on multiple characteristics. (Think about trying to do this with a continuous covariate!)

Propensity score matching

Instead of matching on values, PSM matches observations with similar probabilities of receiving treatment.

  • Estimate a model that predicts membership in the treatment group
  • Group observations with similar predicted treatment probabilities
  • Re-weight the sample so that the treated and control groups are similar
  • Normally, discard observations outside the “region of common support” where the two groups overlap.
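These steps can be sketched in base R (a toy one-to-one nearest-neighbor version with simulated data; in practice you would use `MatchIt`):

```r
set.seed(2)
n <- 2000
x <- rnorm(n)
D <- rbinom(n, 1, plogis(0.5 * x))        # mild selection on x
y <- 1 * D + x + rnorm(n)                 # true treatment effect = 1

# Step 1: model the probability of treatment
ps <- fitted(glm(D ~ x, family = binomial))

# Step 2: match each treated unit to the control with the closest score
treated <- which(D == 1)
control <- which(D == 0)
pairs <- sapply(treated, function(i) control[which.min(abs(ps[control] - ps[i]))])

# Step 3: average the within-pair differences (an ATT-style estimate)
att <- mean(y[treated] - y[pairs])
round(att, 2)                             # close to 1; the naive gap overstates it
```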

Propensity score matching

matched <- matchit(treated ~ education, data = data,
                   method = "nearest",
                   ratio = 2,
                   distance = "glm")
mdat <- match.data(matched)
psm_fit <- lm(income ~ treated + education, data = mdat, weights = weights)

modelsummary(list("ols + control" = ols_control,
                  "exact matching" = exact_match_fit,
                  "psm" = psm_fit),
             gof_map = c("nobs"))
               ols + control   exact matching   psm
(Intercept)    -5.448          -5.446           -6.542
               (0.411)         (0.335)          (0.660)
treatedTreated -2.020          1.879            -1.649
               (0.455)         (0.304)          (1.966)
education      7.084           5.546            8.033
               (0.080)         (0.071)          (0.357)
Num.Obs.       1000            862              222

Propensity scores

Propensity score matching

In practice, propensity scores often work well, but in theory they can actually worsen bias if the treatment selection process is modeled incorrectly. Other methods are more robust to this kind of mis-specification.

Coarsened Exact Matching

  • “Coarsen” the variables and prune unmatched observations (usually requires some user input)
  • Conduct exact matching on the coarsened values
  • Re-weight as needed to balance the groups
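The idea can be sketched in base R (a toy version with an invented continuous covariate; real analyses would use `MatchIt`'s CEM method as below):

```r
set.seed(3)
n <- 5000
x <- rnorm(n)                             # continuous: exact matches never occur
D <- rbinom(n, 1, plogis(x))
y <- 1 * D + x + rnorm(n)                 # true treatment effect = 1

# Step 1: coarsen the covariate into decile bins
bin <- cut(x, breaks = quantile(x, 0:10 / 10), include.lowest = TRUE)

# Step 2: exact matching on the coarsened variable
strata <- split(seq_len(n), bin)
strata <- strata[sapply(strata, function(i) length(unique(D[i])) == 2)]

# Step 3: re-weight the within-bin differences by bin size
kept <- length(unlist(strata))
est <- sum(sapply(strata, function(i) {
  d <- mean(y[i][D[i] == 1]) - mean(y[i][D[i] == 0])
  d * length(i) / kept
}))
round(est, 2)                             # close to the true effect of 1
```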

Coarsened Exact Matching

matched <- matchit(treated ~ education, data = data,
                   method = "cem", estimand = "ATE")
mdat <- match.data(matched)
cem_fit <- lm(income ~ treated + education, data = mdat, weights = weights)

modelsummary(list("ols + control" = ols_control,
                  "exact matching" = exact_match_fit,
                  "psm" = psm_fit,
                  "cem" = cem_fit),
             gof_map = c("nobs"),
             notes = sprintf("actual ATE = %s", round(ate, digits = 3)))
               ols + control   exact matching   psm       cem
(Intercept)    -5.448          -5.446           -6.542    -5.446
               (0.411)         (0.335)          (0.660)   (0.335)
treatedTreated -2.020          1.879            -1.649    1.879
               (0.455)         (0.304)          (1.966)   (0.304)
education      7.084           5.546            8.033     5.546
               (0.080)         (0.071)          (0.357)   (0.071)
Num.Obs.       1000            862              222       862
actual ATE = 1.871

Standard errors

Methods like CEM group observations into strata before weighting, so we would expect to see clustering within the strata, but we know how to deal with this!


 Estimate Std. Error  z Pr(>|z|)     S 2.5 % 97.5 %
     1.88      0.157 12   <0.001 107.5  1.57   2.19

Term: treated
Type:  response 
Comparison: Treated - Control
Columns: term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, predicted_lo, predicted_hi, predicted 

Takeaways

  • Matching can address some of the limitations of regression models and approximate experimental conditions with observational data.

  • Main advantages over regression are:

    • No assumption of a linear relationship between the confounders and the outcome
    • More informally: it allows researchers to separate the treatment-selection process from the treatment-effect process.